Concept
Speech Recognition
Variants
Automatic Speech Recognition
Parents
Children
Atypical Speech
Audio Signal Analysis
Natural Language Generation (Natural Language Processing)
Natural Language Generation (Speech Language Pathology)
Speech Acquisition
72.1K
Publications
3.9M
Citations
110.6K
Authors
9.9K
Institutions
Dynamic Time Warping Alignment
1956 - 1985
The period centers on time alignment and similarity measures: Dynamic Time Warping (DTW) enables time-normalized matching, and dynamic-programming time warping guides word-level alignment across utterances. Front-end representations grounded in Linear Predictive Coding (LPC) and cepstral analysis, together with formant and pitch estimation, yield compact, trainable features that support predictive coding and excitation modeling. Vector quantization and early statistical pattern recognition shape the ASR pipeline, while Hidden Markov Models (HMMs) begin to emerge for speaker-independent isolated-word recognition and open the way to model-based approaches. Acoustic cues such as spectral formants, cepstral pitch, and voicing inform feature extraction and decision rules.
• Time alignment and similarity measures became the central paradigm for speech recognition, with Dynamic Time Warping (DTW) enabling time-normalized matching and dynamic-programming time warping guiding word-level alignment across utterances [9], [16], [7], [12], [10].
• Front-end representations grounded in linear prediction, cepstral analysis, and formant/pitch estimation provided compact, trainable features for recognition, enabling predictive coding and excitation modeling via Linear Predictive Coding (LPC) and cepstrum-based methods [1], [19], [4], [5], [3].
• Vector quantization and statistical pattern recognition shaped early automatic speech recognition pipelines, with Vector Quantization (VQ) design, LPC front ends, and the emerging integration of Hidden Markov Models (HMMs) for speaker-independent isolated-word recognition [6], [17], [20], [8].
• Acoustic-phonetic cue research established spectral formants, cepstral pitch, and voicing cues as foundational signals for speech perception and recognition, informing feature extraction and decision rules [5], [4], [15], [2].
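The time-normalized matching described above can be illustrated with a minimal Dynamic Time Warping sketch. It is an illustration under stated assumptions, not the method of any cited work: the function name dtw_distance, the absolute-difference local cost, the unconstrained step pattern, and the toy one-dimensional sequences are all hypothetical choices. Recognizers of this era would compare frames of feature vectors (for example LPC or cepstral coefficients) and typically add slope or window constraints.

```python
from math import inf


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of numbers.

    Assumption: local cost is the absolute difference between samples,
    and each step may advance either sequence or both (no slope limits).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Toy "utterances": the second is the template at half speed,
# the third has a different contour.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
other = [3.0, 1.0, 3.0, 1.0, 3.0]

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

In this sketch the slowed-down copy aligns to the template at zero cost because warping absorbs the repeated samples, while the differently shaped sequence keeps a positive cost. A template-matching recognizer would pick the stored word with the lowest such cost.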
Time-Delay Neural Network Era
1986 - 2001
Neural Sequence Modeling Emergence
2002 - 2008
Deep Neural Acoustic Modeling
2009 - 2015
Self-Supervised End-to-End Speech
2016 - 2024